In April 2011, NASA’s pioneering cloud profiling radar satellite, CloudSat, experienced a battery anomaly that placed it into Emergency Mode and left it incapable of operations. All initial attempts to recover the spacecraft failed because the resulting power limitations could not support even the lowest power mode. Originally part of a six-satellite constellation known as the “A-Train”, CloudSat was unable to stay within its assigned control box, posing a threat to the other A-Train satellites. CloudSat needed to exit the constellation, but with its tenuous power profile, conducting maneuvers was very risky. The team executed a complex sequence of operations that recovered control, conducted an orbit-lowering maneuver, and returned the satellite to safe mode, all within a single 65-minute sunlit period. During the course of the anomaly recovery, the team developed several bold, innovative operational strategies. Details of the investigation into the root cause and the multiple approaches to revive CloudSat are examined. Satellite communication and commanding during the anomaly are presented. A radical new system of “Daylight Only Operations” (DO-OP) was developed, which cycles payload and subsystem components off at Earth eclipse entry and back on at eclipse exit in order to maintain positive power and thermal profiles. The scientific methodology and operational results behind the graduated testing and ramp-up to DO-OP are analyzed. In November 2011, the CloudSat team successfully restored the vehicle to consistent operational collection of cloud radar data during sunlit portions of the orbit. Lessons learned throughout the six-month return-to-operations effort are discussed and offered for application to other R&D satellites in the context of on-orbit anomaly resolution.


I. Introduction: CloudSat Mission

LAUNCHED in 2006 as part of the NASA Earth System Science Pathfinder (ESSP) program, CloudSat’s unique millimeter-wavelength radar provides scientists with valuable data on the vertical profiles of condensed water and ice that make up the structure of clouds. The scientific goal of the mission is to build the first global survey of the vertical structure of cloud systems and profiles of cloud liquid/ice water content. CloudSat was designed for a 22-month operational life and has just exceeded six years on orbit, a testament to both a robust system design and a very dedicated mission operations team.

Together with CALIPSO, a co-manifested launch partner and a NASA/CNES ESSP satellite, CloudSat joined the Afternoon Constellation, or “A-Train”, an international series of Earth-observing satellites that follow nearly identical ground tracks and allow near-simultaneous observations of the same terrain with a wide variety of complementary instruments. For example, CloudSat’s ability to penetrate clouds with its Cloud Profiling Radar (CPR) complements CALIPSO’s LIDAR observations of aerosol interactions with clouds. The A-Train constellation is in a near-circular, sun-synchronous polar orbit with equatorial crossings at a local mean time of 1:30 pm. At an altitude of about 705 km, the A-Train satellites experience almost 65 minutes of sunlight and 34 minutes of eclipse per Earth revolution (subject to seasonal variation and orbit precession changes).


Figure 1. CloudSat’s position in the Afternoon Constellation or A-Train.1

Mission management, satellite control authority and risk decision authority are the responsibility of the CloudSat project office at the Jet Propulsion Laboratory (JPL), where the CPR was also designed, integrated and tested in partnership with, and with contributions from, the Canadian Space Agency. Ball Aerospace & Technologies Corporation (BATC) designed and built the spacecraft bus, integrated and tested the space vehicle, and currently provides technical operations and anomaly resolution support. Through the worldwide Air Force Satellite Control Network (AFSCN), the US Air Force’s Research, Development, Test and Evaluation (RDT&E) Support Complex (RSC) at Kirtland Air Force Base provides round-the-clock CloudSat operations, mission engineering and ground system sustainment.

II. CloudSat Provides International Science Value

Over a thousand times more sensitive than any ground-based weather radar, CloudSat’s CPR has directly contributed to the improvement of global weather models and to our understanding of how Earth’s clouds are affected by, and influence, climate change, filling a recognized and critical gap in the measurement and understanding of clouds for weather and climate research. Continued CloudSat observations are key to determining the variability of cloudiness on intra-seasonal to inter-annual time scales. This helps the global climate research community determine the relationships between cloud variability, precipitation and key environmental factors, establish important cloud-climate feedbacks and characterize changes on a decadal time scale. The unexpected loss of CloudSat data due to the April 2011 battery anomaly (Section IV) had a significant impact on the evaluation and development of improved cloud schemes for weather and climate models, making CloudSat’s return to operations and to the A-Train a high priority.

The rich synergy of A-Train observations has extended the usefulness of CloudSat observations and has enabled important advances. Fig 2 demonstrates how CloudSat, together with other A-Train instruments, provided the three-dimensional structure of the cloud response to an El Nino event in February 2010. The figure shows tropical (5°S-5°N) averaged CloudSat cloud water content profiles in color shading, Microwave Limb Sounder (MLS/A-Train) upper-tropospheric (UT) water vapor in color contours, Atmospheric Infrared Sounder (AIRS/A-Train) 500 hPa water vapor as the red curve, and NOAA sea surface temperature (SST) in the color map. The synergy of CloudSat with other A-Train observations has provided our most definitive picture to date of how the atmosphere responds to El Nino climate variability.



American  Institute  of  Aeronautics  and  Astronautics 


Copyright  2012  by  Michael  Nayak.  Published  by  the  American  Institute  of  Aeronautics  and  Astronautics,  Inc.,  with  permission. 


A-Train Satellite Observations of 2010 El Nino (02/2010)
Figure 2. CloudSat contributes to observations of the response of the atmosphere to El Nino forcing.2


However, observations of one or two such El Nino events are not sufficient to determine the range of response to climate variability. Multi-year datasets are required, because tropical-mean cloud anomalies are not linearly related to tropical-mean SST changes. As seen in Fig 3, during CloudSat’s pre-anomaly operating period (2006-2011), two El Ninos (2006-07 and 2009-10) exhibited nearly opposite tropical-mean cloud anomalies. Continued observations of cloud profiles by CloudSat, combined with the rest of the A-Train, are needed to quantify the cloud response to climate variability, especially long-term cloud changes in response to surface warming.


Figure 3. CloudSat cloud water content (CWC) and change in cloud fraction (ΔCFr) as a function of height in the atmosphere for two El Nino events (2006-2007 [pink] and 2009-2010 [red]) that exhibit nearly opposite tropical-mean cloud anomalies.2


III. CloudSat Spacecraft System

The power, thermal, fault protection, and attitude control subsystems played key roles in the recovery from the April 2011 battery anomaly and in the realization of the Daylight Only Operations (DO-OP) mode. A brief overview of each is presented in this section.
A. Power Subsystem Overview

CloudSat uses a direct energy transfer power architecture, in which two articulating solar arrays provide over 1,000 watts of solar power to recharge a 40 amp-hour battery. A Power Control Unit (PCU) performs battery charge control and distributes power to spacecraft components over a hardwired essential power bus and four switchable power buses.4

B. Thermal Control Subsystem Overview

CloudSat’s thermal subsystem is primarily passive, relying on multi-layer insulation and thermal radiators to control spacecraft temperature, but it also uses thermostatically and manually controlled heaters. Most heaters are used to prevent components from getting too cold in the safe modes, and can be enabled and disabled by ground command. However, several critical “survival” heaters actively maintain components within tight temperature ranges and, by design, cannot be externally disabled.

C. Under-Voltage Fault Protection Overview

CloudSat has a variety of independent fault protection schemes. One of these schemes involves a series of under-voltage (UV) protection levels. As the battery discharges, the system voltage decreases; if the battery is discharged too far, the UV faults are tripped in sequence. The response to a UV fault is to shed power buses in a hierarchical order so that the more essential components, such as the command subsystem and survival heaters, continue to receive adequate power. The fault protection design proved very flexible, and the ability to modify some of its features was essential to recovering the mission.
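To make the hierarchical shedding concrete, the sketch below models a ladder of UV levels. Only the idea of one hardwired essential bus plus four switchable buses comes from the text; the number of levels, the threshold voltages and the bus names are hypothetical placeholders, not CloudSat's actual fault-protection settings.

```python
# Hypothetical UV ladder: (threshold in volts, bus shed below that voltage).
# Listed from highest threshold (least essential bus) to lowest
# (most essential switchable bus). Values are illustrative only.
UV_LEVELS = [
    (26.0, "switchable_bus_4"),
    (25.5, "switchable_bus_3"),
    (25.0, "switchable_bus_2"),
    (24.5, "switchable_bus_1"),
]
ESSENTIAL_BUS = "essential_bus"  # hardwired: no UV level ever sheds it


def powered_buses(voltage: float) -> list[str]:
    """Return the buses still powered once every UV level above `voltage` has tripped."""
    shed = {bus for threshold, bus in UV_LEVELS if voltage < threshold}
    # Walk the ladder from most to least essential so the result is ordered.
    remaining = [bus for _, bus in reversed(UV_LEVELS) if bus not in shed]
    return [ESSENTIAL_BUS] + remaining


print(powered_buses(27.0))  # healthy voltage: all five buses
print(powered_buses(25.2))  # two levels tripped: buses 3 and 4 shed
```

Below the lowest hypothetical threshold only the essential bus remains, which mirrors the Emergency Mode behavior described in Section IV.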

D. Attitude Determination and Control Subsystem Overview

The attitude determination and control subsystem (ADCS) provides three-axis control of the spacecraft. Attitude knowledge is achieved using star trackers, magnetometers and coarse sun sensors. Attitude control is maintained using torque rods, reaction wheels and thrusters.4

IV. CloudSat Battery Anomaly

In late 2009, CloudSat’s battery started showing initial signs of aging when one of its cells suffered a soft short.5 Although degraded, the battery’s capacity was sufficient to support CPR collections through eclipse, but ground contacts had to be restricted to the sunlit portion of the orbit. Despite this mitigation, the end-of-discharge voltage often hovered close to the first under-voltage fault threshold. On 17 April 2011, the first UV fault level was tripped. The team’s immediate response was to increase the battery charge level, but despite this adjustment, within 24 hours the satellite descended through the remaining UV levels, activating Emergency Mode.

A. CloudSat Enters Emergency Mode

Upon entering Emergency Mode, the thrusters were fired to place CloudSat in a stable spin around the X-axis, and the solar arrays were rotated to achieve positive power regardless of the spacecraft’s orientation relative to the Sun. The UV fault tripped multiple times over the following week, and each time the associated fault response shed all but the essential power bus; however, the thrusters were never fired again.

In this mode, by design, solar input to the arrays varies, but the spacecraft remains sufficiently power positive that the battery is recharged even under worst-case conditions. This design, however, assumed the battery would have sufficient capacity to support the essential loads, including the survival heaters. When the survival heaters were on, the battery was unable to meet the demands of even the low-power Emergency Mode. Available data indicated that the battery could now supply only 10% of the energy it had supplied a few days earlier.
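The shortfall can be framed as a simple eclipse energy budget. The sketch below uses hypothetical load values, duty cycles and available energy (CloudSat's actual figures are not given here) to show how an intermittent survival-heater load can push eclipse demand beyond what a degraded battery can deliver.

```python
def eclipse_energy_wh(loads_w: dict[str, float],
                      duty_cycles: dict[str, float],
                      eclipse_min: float) -> float:
    """Battery energy (Wh) consumed over one eclipse.

    loads_w:      load name -> power in watts
    duty_cycles:  load name -> fraction of the eclipse the load is on;
                  loads not listed are treated as always on (1.0)
    """
    hours = eclipse_min / 60.0
    return sum(p * duty_cycles.get(name, 1.0) * hours for name, p in loads_w.items())


# Hypothetical loads; not CloudSat telemetry.
LOADS = {"essential": 60.0, "survival_heaters": 100.0}
AVAILABLE_WH = 40.0  # hypothetical usable energy of the degraded battery

for duty in (0.0, 0.5):
    need = eclipse_energy_wh(LOADS, {"survival_heaters": duty}, eclipse_min=34)
    print(f"heater duty {duty:.1f}: need {need:.1f} Wh, supported={need <= AVAILABLE_WH}")
```

With these illustrative numbers, the demand over a 34-minute eclipse rises from 34 Wh with the heater off to about 62 Wh with it on half the time, so the same battery covers one case but not the other.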

Fig 4 illustrates the sinusoidal charging caused by the spinning spacecraft during the sunlit portion of the orbit (red line) and the variation in the eclipse battery discharge rate caused by the varying duration of survival heater loads (green line). Unless a solution was found, the battery could not support any of the additional loads needed to recover the spacecraft.




Figure 4. CloudSat Current-Voltage-Time Graph (horizontal axis: time, seconds; eclipse interval marked)


B. Characterizing the Battery Anomaly

Initially it was believed that the battery had suffered another soft short. Further analysis determined that the battery was suffering from diffusion-limiting current, a condition caused by corrosion of the positive electrode that results in a net loss of electrolyte.10 The reduced amount of electrolyte limits the battery’s ability to support current demands, and the voltage drops suddenly when the diffusion limit is reached.

In the event of a cell experiencing a hard short, CloudSat’s design included a spare two-cell common pressure vessel (CPV) that could be switched into the circuit. Unfortunately, this spare CPV could not be used to alleviate either the diffusion-limit phenomenon of April 2011 or the soft short of December 2009, because in both cases the affected cell’s voltage indicates it is healthy while being charged, but drops to the fully discharged level shortly after a load is applied. If the spare CPV were connected, manual charge control would be needed to reach the voltage level required to charge the battery with the extra cells. Apart from being operationally intensive, this approach would place the spacecraft at risk of over-voltage damage, so it was reserved as a last resort.

The actions of the fault protection system further exacerbated this problem. Every time the ground operators increased the charge rate, the survival heaters would turn on in eclipse, the current limit would be exceeded, the UV fault would trip, and in response the charge level would automatically be reset to the default. If the battery had been reasonably healthy, this would not have been an issue, because even at this low rate the battery would have recharged sufficiently to support the loads. However, because the battery could not sustain higher loads, the primary heaters were shed, which dropped the battery temperature into the survival range. At this temperature, battery capacity was reduced further, increasing the frequency of UV faults and creating a self-reinforcing cycle.

The first step to recovery was devising a way out of this cycle. The key was the redundant power control system. It provided a number of “knobs” that could be turned to compensate for anomalous behavior, but when operating in the hot back-up configuration, the available options were more limited. The system was therefore divided in half: one half was assigned charge control, with UV protection disabled, while the other half was assigned UV fault protection duties. Originally, the fault thresholds had been set conservatively above the minimum operating voltage of the components to ensure a healthy reserve. In the new configuration, these thresholds were substantially lowered while a higher charge rate was retained. After this change, battery temperature rose to a more comfortable level and UV faults became less frequent, giving the team breathing room to explore recovery options proactively instead of reactively.
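The split-role arrangement can be expressed as a small configuration check. In this sketch, `PcuHalf` and `validate_split` are hypothetical names, the threshold values are invented, and the rule that the lowered thresholds must still sit above the components' minimum operating voltage is an assumption about how such a reduction would be bounded, not a documented CloudSat constraint.

```python
from dataclasses import dataclass


@dataclass
class PcuHalf:
    """One half of the redundant power control system (hypothetical model)."""
    name: str
    charge_control: bool
    uv_protection: bool
    uv_thresholds_v: tuple[float, ...] = ()


def validate_split(charge_half: PcuHalf, protect_half: PcuHalf,
                   min_operating_v: float) -> bool:
    """Check the split-role arrangement described in the text.

    - charge_half: charge control only, UV protection disabled
    - protect_half: UV protection only, with at least one threshold
    - assumption: every lowered threshold stays above min_operating_v
    """
    roles_ok = (charge_half.charge_control and not charge_half.uv_protection
                and protect_half.uv_protection and not protect_half.charge_control)
    thresholds_ok = bool(protect_half.uv_thresholds_v) and all(
        v > min_operating_v for v in protect_half.uv_thresholds_v)
    return roles_ok and thresholds_ok


side_a = PcuHalf("A", charge_control=True, uv_protection=False)
side_b = PcuHalf("B", charge_control=False, uv_protection=True,
                 uv_thresholds_v=(24.0, 23.5))  # hypothetical lowered thresholds
print(validate_split(side_a, side_b, min_operating_v=23.0))  # True
```

Keeping the two roles on separate halves is what lets the charge half run a higher rate without a UV trip resetting it to the default.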

The next step would be to find a method of managing heater loads in eclipse. But first, the matter of CloudSat’s potential threat to other on-orbit assets had to be addressed.

C. Exit from the Afternoon Constellation

Several weeks into the recovery, it became clear that CloudSat was drifting out of its control box toward Aqua. If CloudSat did not act, Aqua would be forced to interrupt its nominal operations and maneuver out of CloudSat’s way. The challenge was further complicated by the spin rate. As shown in Fig 5, solar, gravitational and magnetic torques were causing the original Emergency Mode spin rate to decrease, further endangering the already delicate power and thermal conditions. The team had to re-establish a stable spin rate by firing thrusters, but this would impart a small ΔV in an unpredictable direction, a potentially dangerous action while still in the A-Train.
Figure 5. Variation of Emergency Mode spin rate early in the recovery


Other than the battery, all systems on CloudSat were fully functional, so the Project decided to attempt to exit the A-Train by lowering CloudSat’s orbit. No maneuver would be possible unless the Spacecraft Computer (SCC) stayed on through eclipse. The team devised a pre-heating strategy: manually controlled heaters would add heat to critical components during the sunlit portion of the orbit so that the survival heaters would not turn on in eclipse, making it possible to avoid UV faults and keep the SCC on.
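A first-order (lumped-capacitance) thermal model is one way to reason about how much heat must be banked in sunlight. In the sketch below, a component cools exponentially toward an effective sink temperature during eclipse; the sink temperature, time constant and set point are hypothetical, and the actual CloudSat analysis may well have used a more detailed thermal model.

```python
import math


def temp_after(t0_c: float, t_sink_c: float, tau_min: float, minutes: float) -> float:
    """Lumped-capacitance cooling toward t_sink_c with time constant tau_min."""
    return t_sink_c + (t0_c - t_sink_c) * math.exp(-minutes / tau_min)


def min_preheat_temp(t_set_c: float, t_sink_c: float, tau_min: float,
                     eclipse_min: float) -> float:
    """Lowest eclipse-entry temperature that still leaves the component at or
    above the heater thermostat set point t_set_c at eclipse exit, so the
    thermostatic (survival) heater never needs to switch on in eclipse."""
    return t_sink_c + (t_set_c - t_sink_c) * math.exp(eclipse_min / tau_min)


# Hypothetical values: 0 C set point, -20 C effective sink, 120 min time constant.
t_entry = min_preheat_temp(t_set_c=0.0, t_sink_c=-20.0, tau_min=120.0, eclipse_min=34.0)
print(f"pre-heat to at least {t_entry:.2f} C")
```

With these illustrative inputs and a 34-minute eclipse, the component would need to enter eclipse a little above 6.5 °C; a longer eclipse or a shorter time constant raises the required pre-heat.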

CloudSat was in a sun-synchronous polar orbit and traveled south to north during the sunlit portion of each orbit. During a single 10-minute contact at one of the southernmost ground stations, the USAF/RSC team uplinked all the commands needed to conduct the maneuver sequence to exit the A-Train. After loss of signal, the SCC began executing the complex, time-consuming sequence to turn selected heaters on, recover control and execute the planned maneuvers. Components were pre-heated during sunlight, the maneuver sequence executed successfully, and the spacecraft returned nominally to Emergency Mode afterward.


D. Halfway There: Escaping Emergency Mode

The success of the pre-heating strategy during the A-Train exit pointed the way toward recovering from Emergency Mode. It had been shown that the manually controlled heaters could keep the survival heaters off, and with the SCC on, the recovery could progress. It would, however, be necessary to cycle these heaters, as well as nearly every other component, on and off at every eclipse exit and entry, an unsustainably long and arduous commanding task. Fortunately, the spacecraft had a previously unused built-in feature that allowed it to store relative-timed sequences of commands that could be executed repeatedly.
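A relative-timed sequence is essentially a list of (offset, command) pairs anchored to a trigger event. The sketch below expands such a list once per eclipse-exit time; the command mnemonics, offsets and the roughly 99-minute (5,940 s) orbit period used in the usage line are illustrative assumptions, not CloudSat's actual sequence contents or flight-software format.

```python
from typing import NamedTuple


class TimedCommand(NamedTuple):
    offset_s: float  # seconds after the trigger event
    command: str     # hypothetical command mnemonic


# Hypothetical sequence stored once and replayed at every eclipse exit.
ECLIPSE_EXIT_SEQ = [
    TimedCommand(0, "MANUAL_HEATERS_ON"),
    TimedCommand(30, "REACTION_WHEELS_ON"),
    TimedCommand(60, "ADCS_MODE_POINT"),
]


def schedule(seq: list[TimedCommand],
             trigger_times_s: list[float]) -> list[tuple[float, str]]:
    """Expand a relative-timed sequence into absolute (time, command) pairs,
    one copy per trigger event, sorted by execution time."""
    return sorted((t + c.offset_s, c.command) for t in trigger_times_s for c in seq)


# Two eclipse exits roughly one orbit (~99 min = 5,940 s) apart.
for when, cmd in schedule(ECLIPSE_EXIT_SEQ, [0, 5940]):
    print(when, cmd)
```

The operator uploads the sequence once; each new orbit only adds a trigger time, which is why this feature removed most of the per-orbit commanding burden.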

The spacecraft could produce the power needed to support the manual heaters only in a relatively narrow range of spin-to-Sun orientations. Further, all of the existing modes of operation were 3-axis controlled with zero net momentum. Therefore, without the power to keep the reaction wheels on during eclipse, any residual momentum would cause the spacecraft to drift, and the solar arrays could well be off-pointed enough that the power demands would not be met at eclipse exit. Additionally, the spin rate could be adversely affected by environmental torques.
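The narrow orientation range follows from the cosine dependence of array output on Sun incidence. The sketch uses an idealized flat-array cosine model with the nominal 1,000 W peak from Section III.A; the 700 W demand is a hypothetical figure, and real array output also depends on temperature, shadowing and spin geometry.

```python
import math
from typing import Optional


def array_power_w(p_max_w: float, off_point_deg: float) -> float:
    """Idealized flat-array output: peak power times the cosine of the Sun
    incidence angle, clamped to zero beyond 90 degrees."""
    return p_max_w * max(0.0, math.cos(math.radians(off_point_deg)))


def max_off_point_deg(p_max_w: float, demand_w: float) -> Optional[float]:
    """Largest off-pointing angle at which the arrays alone still meet the
    demand, or None if the demand exceeds peak output."""
    if demand_w > p_max_w:
        return None
    return math.degrees(math.acos(demand_w / p_max_w))


print(round(max_off_point_deg(1000.0, 700.0), 1))  # hypothetical 700 W demand
```

Under these assumptions the arrays can carry a 700 W load only within about 45.6 degrees of the Sun line, and the window shrinks quickly as demand approaches peak output.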

The next step of the recovery was to develop a “new mode” that would hold the spacecraft within this narrow range when sunlit and overcome the environmental torques. A further constraint was that the mode had to point the solar arrays at the Sun at eclipse exit, so that the arrays could immediately supply all of the power needed by the spacecraft without any help from the battery. Finally, the mode could not be achieved by uploading new software; the time and expense of that approach were prohibitive.

E. Sun Point Spin

The initial solution was a controlled spin mode known as Sun Point Spin (SPS), shown pictorially in Fig 6. In this mode, the attitude control system uses torque rods to maintain a constant spin rate and to point the spin axis at the Sun. The solar arrays were left in an orientation that achieved positive power margin. To survive eclipse, the spacecraft was put into a low-power hibernate mode, which included turning off the torque rods. Because the spacecraft was spin-stabilized, it maintained the desired orientation through eclipse, and at eclipse exit it could be powered on without fear of overtaxing the weakened battery.


Figure 6. Diagrammatic flow of Sun Point Spin (SPS) Mode

The preliminary trial of SPS, with the spin axis pointed directly at the Sun, was not entirely successful. Heater power needs increased because the small cross-section presented to the Sun was causing the spacecraft to run cooler. This was solved by pointing the spin axis 20 degrees off the Sun; the resulting increase in sunlit cross-section provided a good compromise between power and thermal needs.
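The thermal side of this trade can be illustrated by treating the spacecraft as a cylinder spinning about its own axis, a deliberately crude assumption; the radius and length below are hypothetical, not CloudSat's dimensions. The projected area seen by the Sun, and with it the absorbed solar heat, grows quickly as the spin axis tilts away from the Sun line.

```python
import math


def projected_area_m2(radius_m: float, length_m: float, off_sun_deg: float) -> float:
    """Sunlit cross-section of a cylinder whose axis is off_sun_deg from the Sun line.

    The end cap contributes pi*r^2*|cos|, the side contributes 2*r*L*|sin|.
    The result does not change as the cylinder spins about its own axis.
    """
    a = math.radians(off_sun_deg)
    return (math.pi * radius_m ** 2 * abs(math.cos(a))
            + 2 * radius_m * length_m * abs(math.sin(a)))


for angle in (0, 20):  # Sun-pointed versus 20 degrees off the Sun
    print(angle, round(projected_area_m2(0.5, 2.0, angle), 2))
```

For this hypothetical shape, a 20-degree tilt raises the sunlit cross-section from about 0.79 m² to about 1.42 m², consistent with the warming effect the team sought.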

V. CloudSat Recovery to Daylight Only Operations

A. Momentum-Bias Point Mode (Point-Standby)

SPS Mode showed that it was possible to keep the SCC powered and quickly recover operations after exiting eclipse. The team used these lessons to develop the next mode in the evolutionary chain, a Momentum Biased Point (MBP) Mode. The principle of MBP was to store momentum in the reaction wheels. When the wheels were turned off, this momentum would be transferred to the spacecraft body, causing it to spin up, similar to SPS.

The major constraint on MBP was that the momentum had to be low enough to be stored in the wheels while still meeting pointing and maneuver requirements, yet large enough to maintain the proper attitude through eclipse. Additionally, the mode had to be able to flip the spacecraft around so that its orientation at eclipse entry was the same as at eclipse exit. This is illustrated in Fig 7. As it turned out, the existing Point-Standby Mode could be modified through changes to flight software (FSW) parameter tables to meet these requirements. Point-Standby was already designed to perform the yaw flip, and CloudSat’s wheels had been designed with enough excess capacity to accommodate the momentum storage this mode required.
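The constraint on stored momentum is a two-sided window, and the body spin rate after the wheels are switched off follows from conservation of angular momentum. In this sketch, the lower bound uses the small-angle drift estimate (drift ≈ torque × time / momentum) and the upper bound is wheel capacity minus a reserve for pointing and maneuvers; the function names and all numerical values are hypothetical.

```python
import math
from typing import Optional


def body_spin_rate_rad_s(h_wheels_nms: float, body_inertia_kgm2: float) -> float:
    """Body rate after the wheels are switched off and their stored momentum
    has been transferred to the body (angular momentum is conserved; wheel
    inertia is neglected)."""
    return h_wheels_nms / body_inertia_kgm2


def bias_window(torque_nm: float, eclipse_s: float, max_drift_rad: float,
                wheel_capacity_nms: float,
                reserve_nms: float) -> Optional[tuple[float, float]]:
    """Allowed range (h_min, h_max) for the stored momentum, or None if empty.

    h_min: enough momentum that drift ~ torque * t / h stays within max_drift_rad
    h_max: wheel capacity minus momentum reserved for pointing and maneuvers
    """
    h_min = torque_nm * eclipse_s / max_drift_rad
    h_max = wheel_capacity_nms - reserve_nms
    return (h_min, h_max) if h_min <= h_max else None


# Hypothetical: 1e-4 N*m torque, 2,040 s eclipse, 5 deg allowed drift,
# 20 N*m*s wheel capacity with 8 N*m*s held in reserve.
window = bias_window(1e-4, 2040.0, math.radians(5.0), 20.0, 8.0)
print(window)
print(math.degrees(body_spin_rate_rad_s(10.0, 1000.0)))  # deg/s for a 10 N*m*s bias
```

With these illustrative inputs the window runs from roughly 2.3 to 12 N·m·s; a much larger disturbance torque or a tighter drift allowance would close it entirely.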

While it was relatively easy to make this new mode work, it took considerable effort to make it work well. In MBP, the ADCS stops the spin and maneuvers to point the CPR boresight at nadir shortly after eclipse exit. As the spacecraft flies from south to north, it rotates about the boresight to point the +X-axis toward the Sun while also rotating the solar arrays to point at the Sun. As with all the new modes, the spacecraft was placed into a low-power hibernate mode while in eclipse.

B. Graduating to DO-OP with a Preparatory Mode

With the success of MBP, a return to full operational capability was within easy reach. Daylight Only Operations (DO-OP) mode would only require adding CPR on/off cycling to the Point-Standby sequence. There was, however, one issue that had to be resolved first. An important component of CPR science data is geolocation, which enables analysis of specific weather systems across certain areas or comparison with global trends.6-9 To make this possible, both the GPS and the Solid State Recorder (SSR) had to remain powered throughout eclipse; however, both were powered by the Payload bus, which had not yet been turned on during eclipse. These loads were small enough for the battery to support through eclipse, but the bus also powered two thermostatically controlled “stability” heaters that could not be disabled. If triggered in eclipse, these heaters would almost certainly trip a UV fault.


Figure 7 panels:
Heater Management: manual heaters commanded on while in sun; prevents thermostatically controlled heaters from turning on in eclipse.
Transition to Hibernate: initiated prior to eclipse entrance; SC spins up to constant rate.
Hibernate: vehicle spins about +X axis; low power consumption; SC easily survives eclipse.
Transition to DO-OP Mode: initiated after eclipse exit; SC spins down and maneuvers to point CPR boresight at nadir.
Day-Light Operational Mode: CPR fully operational 9.5 min after eclipse exit; SC continuously yaws around nadir and articulates arrays to point them at the Sun.

Figure 7. Diagrammatic flow of Standby, Prep and DO-OP Modes

Once again, the team leveraged lessons learned from previous successes. The pre-heating strategy was recalculated with the goal of using the manual heaters in sunlight to create extra heat and raise the temperature above the set point of these stability heaters. The problem was that reaching this point would take longer than one orbit. The solution was to add a preparatory mode: “PREP” would bring the spacecraft up to a full operational state during the sunlit portion of the orbit but still keep the payload bus off in eclipse. This sequence would be repeated until the desired temperature was reached, at which point CloudSat could switch to DO-OP mode and be back in the business of gathering cloud science data.
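The need for several PREP orbits can be seen with a discrete first-order model in which each orbit of sunlit heating closes a fixed fraction of the gap to a heated equilibrium temperature. The model, its parameters and the function name are hypothetical; the sketch only illustrates why a target near the equilibrium takes multiple orbits to reach.

```python
from typing import Optional


def orbits_to_reach(t_start_c: float, t_target_c: float, t_equilibrium_c: float,
                    gap_closed_per_orbit: float, max_orbits: int = 100) -> Optional[int]:
    """Count PREP orbits until the temperature reaches t_target_c.

    Hypothetical first-order model: each orbit of sunlit heating (with the
    payload bus off in eclipse) closes a fixed fraction of the gap between the
    current temperature and a heated equilibrium temperature.
    Returns None if the target is not reached within max_orbits.
    """
    t = t_start_c
    if t >= t_target_c:
        return 0
    for n in range(1, max_orbits + 1):
        t += gap_closed_per_orbit * (t_equilibrium_c - t)
        if t >= t_target_c:
            return n
    return None


print(orbits_to_reach(0.0, 7.5, 10.0, 0.5))
```

In this toy model a 7.5 °C target with a 10 °C equilibrium takes two orbits, while any target above the equilibrium is never reached, which is when more heater power, rather than more orbits, would be needed.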

C. Making it an Operational System

Throughout the recovery process, the goal of making CloudSat truly operational once again was never far from mind. The first step toward this goal was to adapt the new modes developed during the recovery to provide a layered fault response capability. For example, the system will now autonomously transition to the new Point-Standby mode if a UV fault is tripped. A variant of SPS called “Recovery Mode” was developed to respond to attitude rate and pointing errors. Recovery Mode uses less power than SPS Mode: in a poor power orientation, the spacecraft stays power positive while it autonomously maneuvers to an attitude where the higher-power SPS Mode can be engaged.
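The layered response can be sketched as a small state-transition function. The mode and fault names follow the text, but the 30-degree Sun-angle limit for handing off from Recovery Mode to SPS is a hypothetical placeholder, and the function is an illustration of the logic rather than a representation of CloudSat's flight software.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    DO_OP = "DO-OP"
    POINT_STANDBY = "Point-Standby"
    RECOVERY = "Recovery"
    SPS = "Sun Point Spin"


class Fault(Enum):
    UNDER_VOLTAGE = "UV"
    ATTITUDE_RATE = "attitude rate"
    POINTING_ERROR = "pointing error"


# Fault -> autonomous response, as described in the text.
FAULT_RESPONSE = {
    Fault.UNDER_VOLTAGE: Mode.POINT_STANDBY,
    Fault.ATTITUDE_RATE: Mode.RECOVERY,
    Fault.POINTING_ERROR: Mode.RECOVERY,
}

SPS_MAX_SUN_ANGLE_DEG = 30.0  # hypothetical "good power orientation" limit


def next_mode(current: Mode, fault: Optional[Fault] = None,
              sun_angle_deg: Optional[float] = None) -> Mode:
    """Faults take priority; otherwise Recovery Mode hands off to the
    higher-power SPS Mode once the attitude is good enough for power."""
    if fault is not None:
        return FAULT_RESPONSE[fault]
    if (current is Mode.RECOVERY and sun_angle_deg is not None
            and sun_angle_deg <= SPS_MAX_SUN_ANGLE_DEG):
        return Mode.SPS
    return current
```

For example, `next_mode(Mode.DO_OP, Fault.UNDER_VOLTAGE)` returns `Mode.POINT_STANDBY`, and `next_mode(Mode.RECOVERY, sun_angle_deg=20.0)` returns `Mode.SPS`.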

The ability to conduct ΔV maneuvers was also added. The first maneuvers added were two collision avoidance (COLA) sequences, designed to conduct an orbit lowering or orbit raising on short notice.

Currently, the ability to conduct the ΔV maneuvers needed to re-enter the A-Train is being tested and implemented. This includes the capability to conduct large inclination increase and decrease maneuvers, as well as small trim burns for constellation station-keeping.

Today, DO-OP has evolved into a fully operational flight mode. All the modes have been standardized to allow easy switching and to reduce the probability of execution errors. Fig 8 below illustrates the newly designed modes on CloudSat and contrasts them with the old modes at the time of launch.


New Mode | Old Mode | Key Feature
Daylight Only Operations (DO-OP), a.k.a. “Momentum Bias Point”; Prep | Operational (Point Mode) | SC is fully operational; CPR taking data; CPR boresight pointed at nadir; SC maneuvers around boresight to point +X axis at Sun
Standby, a.k.a. “MBP-Standby” or “Point Standby” | Standby (Point-Standby) | … heritage Point Standby target
Emergency | Emergency | No Change

Further entries recovered from the figure (row and column placement not recoverable): Acquire Sun; Solar Array rotated to +/- 40°; No Equivalent Mode; Solar Array rotated to +/- 40°.

Figure 8. Comparison of Old and New CloudSat Flight Modes
D. Evaluation of Success of Current CONOPS

Once the spacecraft demonstrated that it could support operations in DO-OP mode, the team’s efforts focused on recovering the CPR. The first power-on of the instrument since the anomaly occurred on September 28, 2011. Through much of October 2011, the CPR was powered on incrementally, repeating the checkout sequence used after launch. Once all elements of the CPR were verified to be healthy, it was powered on for successively longer periods until it was consistently operating for 54 minutes each orbit. Following a final pointing adjustment, the project office at JPL declared CloudSat fully operational in November 2011.
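The sunlit-only duty cycle described above can be illustrated with a short Python sketch that turns known eclipse exit and entry times into payload on/off windows. It is a simplified illustration under stated assumptions, not the flight procedure: the eclipse times are assumed to come from orbit propagation, the orbit period of roughly 99 minutes is an approximation, and the settling margin after eclipse exit and the shutdown margin before eclipse entry are hypothetical parameters. The 8- and 3-minute values used below were chosen only so that a 65-minute sunlit arc reproduces the 54-minute collection window, and the command names are placeholders rather than real CloudSat commands.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SunlitArc:
    """One sunlit portion of an orbit, in minutes from an arbitrary epoch."""
    eclipse_exit: float   # spacecraft emerges from Earth's shadow
    eclipse_entry: float  # spacecraft enters Earth's shadow


def payload_windows(arcs: List[SunlitArc],
                    settle_after_exit: float,
                    margin_before_entry: float) -> List[Tuple[float, float]]:
    """Return (on, off) payload times, one pair per sunlit arc.

    The payload is switched on only after the bus has had `settle_after_exit`
    minutes to recover sun pointing, and switched off `margin_before_entry`
    minutes before eclipse so loads are shed before the arrays go dark.
    Both margins are hypothetical, not CloudSat flight values.
    """
    windows = []
    for arc in arcs:
        on = arc.eclipse_exit + settle_after_exit
        off = arc.eclipse_entry - margin_before_entry
        if off > on:  # skip arcs too short to hold any collection window
            windows.append((on, off))
    return windows


def build_timeline(windows: List[Tuple[float, float]]) -> List[Tuple[float, str]]:
    """Flatten windows into time-ordered (time, command) pairs.

    PAYLOAD_ON / PAYLOAD_OFF are placeholder names, not real CloudSat commands.
    """
    timeline = []
    for on, off in windows:
        timeline.append((on, "PAYLOAD_ON"))
        timeline.append((off, "PAYLOAD_OFF"))
    return sorted(timeline)


if __name__ == "__main__":
    PERIOD = 99.0  # approximate orbit period in minutes (assumption)
    SUNLIT = 65.0  # sunlit minutes per orbit, as stated in the text
    arcs = [SunlitArc(k * PERIOD, k * PERIOD + SUNLIT) for k in range(3)]
    windows = payload_windows(arcs, settle_after_exit=8.0, margin_before_entry=3.0)
    for on, off in windows:
        print(f"on {on:6.1f}  off {off:6.1f}  -> {off - on:.1f} min of collection")
    for t, cmd in build_timeline(windows):
        print(f"{t:6.1f}  {cmd}")
```

With these illustrative margins, each 65-minute sunlit arc yields a 54-minute window (8.0 to 62.0, 107.0 to 161.0 and 206.0 to 260.0 minutes). A real planner would presumably derive the margins from thermal and power analysis rather than treating them as fixed constants.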

With radar data collection restored, CloudSat could now return to its original orbit in the A-Train. Like all other operations, maneuvers had to be conducted during the sunlit portion of the orbit. This constraint posed continual challenges for the USAF-led Mission Engineers at the RSC; however, with the experience gained over the previous six months of anomaly recovery, they were up to the task. The first demonstration of maneuver capability came in October 2011, when CloudSat successfully performed a maneuver to avoid a potential collision with a piece of space debris. With maneuverability demonstrated, the Project had the confidence to recommend to NASA that CloudSat be allowed to return to the A-Train. A plan for this return is in place and is currently being executed.

VI. Lessons Learned from the CloudSat Anomaly Recovery

With the bulk of the anomaly response and recovery complete, our attention now turns to operational lessons learned. Over six months of contingency commanding and frequent setbacks, the team learned a great deal about the mission management of an anomalous R&D satellite.
A. Keys to Success

From start to finish, this recovery was a team effort. This section presents some of the team’s keys to success, without which CloudSat would not be operational today.

1. Communication - Effective communication among the members of a geographically separated team was critical to keeping everyone on the same page. Open teleconferences were conducted for hours at a time. Subsystem engineers worked to understand and characterize the failure, while operators focused on capturing critical engineering telemetry from the spacecraft to assist in the characterization. Disagreements arose, but they were resolved easily thanks to the open lines of communication.

2. Involvement - Operations concepts were developed at BATC, decisions were made by JPL, and implementation was carried out by operators at the RSC. All of the organizations (BATC, JPL and RSC) reviewed the plans prior to upload to the spacecraft. An important part of the operators’ success was keeping the entire team involved at a technical level. Understanding why operations had been modified, or why certain telemetry values had taken on greater importance, kept everyone informed and fostered a high degree of collaboration.

3. Co-Location - JPL Project Managers and experts were embedded with the spacecraft team at BATC, and the JPL Flight Director was embedded with the operations team at the RSC. This facilitated face-to-face interaction to ensure smooth communication and rapid decision making. Just as importantly, it fostered a better understanding of the needs and constraints of the different organizations during difficult and often stressful times.

4. Creativity - Because of the severity of the battery limitations, engineers at BATC were challenged to come up with new and innovative ways to operate the spacecraft. The BATC team deserves much credit for conceiving and implementing new approaches to fault protection and momentum management that today allow the spacecraft to reduce loads during eclipse and emerge into the sun with the solar arrays immediately receiving energy. Another example was a ground system limitation that prevented the RSC from uploading blocks containing more than 540 commands. Several of the new modes required larger blocks, but once again creativity prevailed: RSC Mission Engineers developed new command management techniques that allowed larger blocks of commands to be uploaded within the same contact duration.

5. Urgency - On occasion, because of either RSC problems or antenna restrictions at the ground site, the vehicle has “faded hot”, i.e., gone over the horizon with the transmitter still on. Rapid response by RSC Mission Operators, who recognized the error, added an emergency contact, generated the pass plan and brought up the contact within minutes, prevented setbacks from under-voltage (UV) faults during eclipse.

6. Flexible Assets

a. AFSCN: As an R&D satellite, with lower priority than operational satellites, CloudSat was vulnerable to being bumped from scheduled AFSCN contacts. When needed, however, RSC Mission Engineers and Operators lobbied successfully for extra contacts and responded effectively and efficiently to the challenges posed by sudden faults, collision avoidance (COLA) events, and short-notice changes in plans. The worldwide reach of the AFSCN assets, together with the flexibility of the staff, was indispensable to the recovery.

b. Ground System Architecture: A thoroughly designed and vetted ground system, capable of distributing telemetry and trending data within minutes of a contact, was crucial. It allowed BATC engineers to understand and characterize the spacecraft behavior immediately and kept the recovery effort from losing ground.

c. Flight Software Test Bench: The team was generating completely new operating modes. Had a similar situation occurred before launch, the magnitude of the necessary changes would probably have required 6-9 months of ground testing alone. The availability of a software test bench to evaluate and validate the performance of the new modes was essential. Using this bench also enabled developers at BATC and operators at the RSC to speak the same language and carefully coordinate their actions, helping to manage and mitigate the risk of game-ending commanding errors.

7. Veteran Team - As soon as the anomaly occurred, Project management, with the support of BATC, JPL and USAF management, was able to quickly pull in experienced resources from other areas to assist in the anomaly resolution. These included veteran CloudSat engineers who were familiar with the spacecraft systems and experienced operators who were familiar with its operations.

8. Risk Management - Because the spacecraft was in serious jeopardy, creative engineering solutions were needed, but risk had to be balanced each step of the way. By far the biggest risk, however, was indecision: failing to respond in a timely fashion would have lost the battery altogether to insufficient charging and decreasing temperatures. Risk mitigation became a three-step process: internal team checks, test bench runs, and on-orbit validation. There was also a human element to risk management. Care had to be taken to ensure effective communication among team members and between physical locations: JPL (California), BATC (Colorado) and the RSC (New Mexico). It was equally important to recognize that a tired team can make mistakes, so managing human factors was another key to our success.

B. Team Takeaways

Although CloudSat was once thought to be on the brink of end of life (EOL)³, the team was able to demonstrate what is possible when the CONOPS is flexibly redefined to save a multi-hundred-million-dollar mission. We hope that our lessons and takeaways will be useful to other mission personnel who find themselves in an analogous situation.

1. Understanding - Never give up on trying to understand the problem. Even with many meetings and teleconferences with experts, it took approximately four months to fully understand the anomalous battery behavior and its discharge response during eclipse.

2. Scrutiny - CloudSat is now flying and operating in a mode for which it was never designed. This “new normal” continues to challenge the team as they learn the nuances of how the spacecraft behaves in DO-OP. Performing even the simplest maneuver or making the smallest change to onboard sequences can produce unexpected results. The team must continue to scrutinize changes, anticipate unintended results, and test, test, test prior to implementation. In similar situations, management, engineers and operators alike are encouraged to “push the alert button” if anything is unclear or uncertain. Resources such as the FSW test bench are absolutely essential for such efforts.

3. Risk - Don’t be afraid to accept risk when recovering from a serious anomaly. Often a decision has to be made without all the desired information. Depending on the situation, inaction can cost more than action.

4. Luck - Don’t be too quick to downplay the luck factor, both good and bad. We were lucky that the battery had just enough capacity left to support the DO-OP mode. We were lucky that we didn’t lose the spacecraft altogether while we were fighting to regain control in the first few weeks. We were lucky that the spacecraft survived an extended ground system outage at the RSC that left us unable to contact it for days. There will always be unknown unknowns and, much as we would like to take complete credit for the successful recovery, luck played a part as well.

5. Staffing - Short-notice work will always come up in anomalous situations, and over an extended timeframe this can become a problem. When the situation permitted, the operators requested advance notice of late work and staggered their shifts to ensure coverage. Ultimately this reduced the risk to operations, because building multiple new commands on a daily basis was highly labor intensive and, if done wrong, could negate all the good work done to date. Given notice and staffing leeway, the RSC team was also able to develop several automated scripts to remove much of this labor and risk.

6. Mission Assurance - The team had neither the time nor the resources to conduct extensive reviews. Given the urgency of a degrading situation, such reviews can slow, if not set back, a recovery effort. NASA HQ entrusted JPL and the Project team with the recovery effort, which was a critical enabling factor. More importantly, with that trust comes the responsibility of operational management to do it right: while several decisions were made on the go at the Project level, independent review teams were brought in at crucial junctures for validation, including the JPL Office of the Chief Engineer, the JPL Associate Director for Flight Projects & Mission Success and the JPL Office of Safety and Mission Assurance.

VII. Conclusion

In November 2011, NASA/JPL declared CloudSat fully operational in DO-OP mode under the revised CONOPS. The spacecraft cycles its subsystems on and off between the sunlit and eclipse portions of the orbit via weekly command sequences from the RSC. CloudSat collects science data during the sunlit portions of the orbit, flying below the A-Train, and hibernates in a stable spin during eclipse, recovering and returning to sun pointing as it emerges from the dark side of the Earth. This new CONOPS requires constant care and monitoring of the thermal and power profiles, as well as more intensive commanding by the CloudSat operators. Though CloudSat will never be a fully nominal mission again, it is collecting data for 54 of the 65 sunlit minutes in each orbit, and the Cooperative Institute for Research in the Atmosphere (CIRA) has resumed distributing science data to the CloudSat community. DO-OP is in use today, and maneuvers are currently being executed to return CloudSat to the A-Train, where it will fly 88 along-track seconds behind CALIPSO and resume its role in the constellation.
Appendix A
Acronym List

ADCS - Attitude Determination and Control System
AFSCN - Air Force Satellite Control Network
AIRS - Atmospheric Infrared Sounder
AOS - Acquisition of Signal
BATC - Ball Aerospace & Technologies Corporation
CALIPSO - Cloud Aerosol LIDAR & Infrared Pathfinder Satellite Observations
CIRA - Cooperative Institute for Research in the Atmosphere
CNES - Centre National d’Etudes Spatiales
COLA - Collision Avoidance
CONOPS - Concept of Operations
CPR - Cloud Profiling Radar
CPV - Constant Pressure Vessel
CSA - Canadian Space Agency
CSM - Command Storage Memory
CWC - Cloud Water Content
DO-OP - Daylight Only Operations
EOL - End of Life
ESSP - Earth System Science Pathfinder
FSW - Flight Software
ICV - Initial Condition Vector
JPL - Jet Propulsion Laboratory
MBP - Momentum Bias Point
MLS - Microwave Limb Sounder
NASA - National Aeronautics and Space Administration
PCU - Power Control Unit
RDA - Risk Decision Authority
RDT&E - Research, Development, Test and Evaluation
RSC - RDT&E Support Complex
SCA - Spacecraft Control Authority
SCC - Spacecraft Computer
SPS - Sun Point Spin
SSR - Solid State Recorder
SST - Sea Surface Temperature
USAF - United States Air Force
UV - Under Voltage
VT - Voltage-temperature (battery charging algorithm)